ggplot2 is an R package for data visualization
The “gg” stands for “grammar of graphics”
Popular (well-supported, great community)
Open source (like all of R)
Easy to use (after a learning curve)
Aesthetically pleasing
Built for multivariate data
Reproducible figures
You can make anything!
http://spatial.ly/2013/12/introduction-spatial-data-ggplot2/
http://chrisladroue.com/2012/02/polar-histogram-pretty-and-useful/
https://twitter.com/bettermeasured/status/885956556046643200
Teach you enough that you know how to teach yourself more!
Introduce “grammar of graphics” structure
Send you away with a list of resources (including these slides)
Grammar of graphics
ggplot(data = diamonds, aes(x = carat, y = price, color = clarity)) +
  geom_point() +
  facet_grid(color ~ cut)

Download scripts from the following site: https://is.gd/IYoXwA
Open RStudio.
In RStudio, open “install_packages.R”. Highlight the text and click “Run”.
Still in RStudio, open “workshop_script.R”. We’ll work from this for the rest of the presentation.
Load packages with the “library” commands at the top of the script.
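The workshop script itself is not reproduced in these slides, so the package list below is an assumption, inferred from the functions called later (for example `theme_few()`, `geom_joy()`, `wes_palette()`, `melt()`, `kable()`, `geom_hex()` and `map_data()`). A minimal sketch for checking which of them still need installing:

```r
# Packages inferred from the functions used in these slides (an assumption,
# not a copy of install_packages.R).
pkgs <- c("ggplot2", "ggthemes", "ggjoy", "viridis", "wesanderson",
          "reshape2", "knitr", "gapminder", "dplyr", "maps", "hexbin")

# requireNamespace() returns FALSE (without an error) when a package is absent.
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!installed]
missing_pkgs

# install.packages(missing_pkgs)  # uncomment to install whatever is missing
```

An empty result (`character(0)`) means everything in the list is already available.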
First, we need data.
Let’s use “diamonds”, a dataset that ships with ggplot2.
| carat | cut | color | clarity | depth | table | price | x | y | z |
|---|---|---|---|---|---|---|---|---|---|
| 0.23 | Ideal | E | SI2 | 61.5 | 55 | 326 | 3.95 | 3.98 | 2.43 |
| 0.21 | Premium | E | SI1 | 59.8 | 61 | 326 | 3.89 | 3.84 | 2.31 |
| 0.23 | Good | E | VS1 | 56.9 | 65 | 327 | 4.05 | 4.07 | 2.31 |
| 0.29 | Premium | I | VS2 | 62.4 | 58 | 334 | 4.20 | 4.23 | 2.63 |
| 0.31 | Good | J | SI2 | 63.3 | 58 | 335 | 4.34 | 4.35 | 2.75 |
| 0.24 | Very Good | J | VVS2 | 62.8 | 57 | 336 | 3.94 | 3.96 | 2.48 |
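Before plotting, it can help to check the size of the data and the type of each column. A short sketch, assuming only that ggplot2 is loaded; `cut`, `color` and `clarity` are stored as ordered factors, which affects the default color scale ggplot2 picks for them:

```r
library(ggplot2)  # provides the diamonds dataset

dim(diamonds)                  # number of rows and columns
is.ordered(diamonds$clarity)   # TRUE: clarity is an ordered factor
levels(diamonds$clarity)       # levels, from lowest to highest clarity
summary(diamonds$price)        # range of the variable we will plot on y
```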
ggplot(data = diamonds)

ggplot(data = diamonds,
       aes(x = carat, y = price))

ggplot(data = diamonds,
       aes(x = carat, y = price)) +
  geom_point()

# color mapped to a variable inside aes()
ggplot(data = diamonds,
       aes(x = carat, y = price, color = clarity)) +
  geom_point()

# color = clarity sits outside aes() here, so it is not mapped to the data
ggplot(data = diamonds,
       aes(x = carat, y = price), color = clarity) +
  geom_point()

# color set to a constant inside the geom
ggplot(data = diamonds,
       aes(x = carat, y = price)) +
  geom_point(color = "magenta")

(with a small difference:)
ggplot(data = diamonds,
       aes(x = carat, y = price)) +
  geom_point(aes(color = clarity))

ggplot(data = diamonds,
       aes(x = carat, y = price)) +
  geom_point(aes(color = clarity)) +
  geom_smooth()

ggplot(data = diamonds,
       aes(x = carat, y = price)) +
  geom_point(aes(color = clarity)) +
  geom_smooth(color = "black", size = 0.8, linetype = 2)

ggplot(data = diamonds,
       aes(x = carat, y = price)) +
  geom_point(aes(color = clarity)) +
  geom_smooth(color = "black", size = 0.8, linetype = 2) +
  theme_few()

ggplot(data = diamonds,
       aes(x = carat, y = price)) +
  geom_point(aes(color = clarity)) +
  geom_smooth(color = "black", size = 0.8, linetype = 2) +
  facet_wrap(~cut)

ggplot(data = diamonds, aes(x = carat, y = price, color = clarity)) +
  geom_point(size = 0.5) +
  facet_grid(color ~ cut)

ColorBrewer is useful and popular:
ggplot(data = diamonds, aes(x = carat, y = price)) +
  geom_point(aes(color = clarity)) +
  theme_few() +
  scale_color_brewer(type = "qual", palette = "Set2")

ggplot(data = diamonds,
       aes(x = clarity, y = price)) +
  geom_violin()

ggplot(data = diamonds,
       aes(x = price, y = cut)) +
  geom_joy()

ggplot(data = diamonds,
       aes(x = price, y = cut, color = cut, fill = cut)) +
  geom_joy(alpha = 0.8, scale = 5) +
  scale_fill_viridis(option = "A", discrete = TRUE) +
  scale_color_viridis(option = "A", discrete = TRUE) +
  theme_few()

# Current wesanderson releases name this palette "Darjeeling1"
ggplot(data = diamonds,
       aes(x = price, y = cut, fill = cut)) +
  geom_joy(alpha = 1, scale = 5) +
  scale_fill_manual(values = wes_palette("Darjeeling1")) +
  theme_few()

kable(head(iris))

| Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species |
|---|---|---|---|---|
| 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 4.7 | 3.2 | 1.3 | 0.2 | setosa |
| 4.6 | 3.1 | 1.5 | 0.2 | setosa |
| 5.0 | 3.6 | 1.4 | 0.2 | setosa |
| 5.4 | 3.9 | 1.7 | 0.4 | setosa |
iris_long <- melt(iris, id.vars = "Species")
kable(head(iris_long))

| Species | variable | value |
|---|---|---|
| setosa | Sepal.Length | 5.1 |
| setosa | Sepal.Length | 4.9 |
| setosa | Sepal.Length | 4.7 |
| setosa | Sepal.Length | 4.6 |
| setosa | Sepal.Length | 5.0 |
| setosa | Sepal.Length | 5.4 |
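The same wide-to-long reshape can also be written with tidyr, which is not used elsewhere in these slides; this is an alternative sketch, not the workshop's method. The name `iris_long_tidyr` is made up here so that it does not overwrite `iris_long`:

```r
library(tidyr)

# Every column except Species becomes a (variable, value) pair.
iris_long_tidyr <- pivot_longer(
  iris,
  cols = -Species,
  names_to = "variable",
  values_to = "value"
)

head(iris_long_tidyr)
nrow(iris_long_tidyr)  # 150 flowers x 4 measurements
```

The rows come out in a different order than with `melt()`, and `variable` is a character column rather than a factor, but the plots that follow work the same way with either version.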
ggplot(iris_long, aes(x = Species, y = value, fill = variable)) +
  geom_bar(stat = 'identity', width = 1) +
  theme_bw()

ggplot(iris_long,
       aes(x = Species, y = value, color = variable, fill = variable)) +
  geom_bar(stat = 'identity', width = 1) +
  coord_polar(theta = 'x') +
  theme_bw()

kable(head(gapminder))

| country | continent | year | lifeExp | pop | gdpPercap |
|---|---|---|---|---|---|
| Afghanistan | Asia | 1952 | 28.801 | 8425333 | 779.4453 |
| Afghanistan | Asia | 1957 | 30.332 | 9240934 | 820.8530 |
| Afghanistan | Asia | 1962 | 31.997 | 10267083 | 853.1007 |
| Afghanistan | Asia | 1967 | 34.020 | 11537966 | 836.1971 |
| Afghanistan | Asia | 1972 | 36.088 | 13079460 | 739.9811 |
| Afghanistan | Asia | 1977 | 38.438 | 14880372 | 786.1134 |
ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
  geom_point()

Who’s the outlier?

gapminder %>% filter(gdpPercap > 60000)
## # A tibble: 5 x 6
## country continent year lifeExp pop gdpPercap
## <fctr> <fctr> <int> <dbl> <int> <dbl>
## 1 Kuwait Asia 1952 55.565 160000 108382.35
## 2 Kuwait Asia 1957 58.033 212846 113523.13
## 3 Kuwait Asia 1962 60.470 358266 95458.11
## 4 Kuwait Asia 1967 64.624 575003 80894.88
## 5 Kuwait Asia 1972 67.712 841934 109347.87
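Another way to find the outlier is to pull the single row with the largest GDP per capita. The sketch below also draws the scatterplot on a log10 x axis, one common way to keep a few extreme values from squashing the rest of the points; `top` and `p_log` are names made up for this example:

```r
library(ggplot2)
library(gapminder)

# Row with the largest GDP per capita in the whole dataset
top <- gapminder[which.max(gapminder$gdpPercap), ]
top

# Same scatterplot, with a log10 x axis
p_log <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
  geom_point(alpha = 0.5) +
  scale_x_log10()
p_log
```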
ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
  geom_hex()

ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
  geom_density2d(aes(color = ..level..), bins = 20) +
  scale_color_viridis()

p <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
  geom_point(aes(color = continent, size = pop), alpha = 0.8) +
  scale_x_continuous(trans = 'log') +
  facet_wrap(~year) +
  scale_color_brewer(type = "qual", palette = "Accent") +
  theme_hc(style = 'darkunica') +
  theme(text = element_text(size = 9))
p

Prepare data:
country_df <- map_data('world') %>%
  rename("country" = "region")
country_df$country[country_df$country == "USA"] <- "United States"

# Take the mean across all years for each country:
gapminder_means <- gapminder %>%
  group_by(country, continent) %>%
  summarise(lifeExp = mean(lifeExp),
            pop = mean(pop),
            gdpPercap = mean(gdpPercap))

plot_dat <- left_join(gapminder_means, country_df, by = "country")

ggplot(plot_dat) +
  geom_polygon(aes(x = long, y = lat, fill = lifeExp, group = group)) +
  scale_fill_viridis(option = "A") +
  coord_quickmap() +
  theme_few()

What questions could we ask with this data?
How could we visually answer those questions?
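One question worth asking of the map itself: did every gapminder country find a match in the map data? The join is by country name, and "USA" already had to be renamed by hand, so other names may differ too. A minimal sketch for listing the names that did not match; `unmatched` is a name made up for this example:

```r
library(dplyr)
library(ggplot2)    # map_data(); requires the maps package to be installed
library(gapminder)

# Rebuild the map table the same way as in the data preparation step
country_df <- map_data("world") %>%
  rename(country = region)
country_df$country[country_df$country == "USA"] <- "United States"

# gapminder countries with no identically named region in the map data
unmatched <- setdiff(unique(as.character(gapminder$country)),
                     unique(country_df$country))
unmatched
```

Any country listed here will be missing from the map until its name is recoded, in the same way as "USA" above.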
Thanks!
Adrienne Marshall mars7850@vandals.uidaho.edu